Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 165600 |
| Missing cells | 151588 |
| Missing cells (%) | 5.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 21.5 MiB |
| Average record size in memory | 136.0 B |
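The percentage and per-record figures in the table above follow from simple arithmetic on the reported totals. A minimal sketch, assuming the report counts one cell per variable per observation and that 1 MiB is 1,048,576 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 17
n_observations = 165600
missing_cells = 151588
total_size_mib = 21.5

# Assumption: every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The computed record size lands close to the reported 136.0 B; the small gap presumably comes from the total size being rounded to 21.5 MiB.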
Variable types
| NUM | 12 |
|---|---|
| CAT | 5 |
Reproduction
| Analysis started | 2020-11-16 03:27:33.269262 |
|---|---|
| Analysis finished | 2020-11-16 03:28:24.262472 |
| Duration | 50.99 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| Zipcode has a high cardinality: 33120 distinct values | High cardinality |
| State has a high cardinality: 51 distinct values | High cardinality |
| City has a high cardinality: 14740 distinct values | High cardinality |
| Metro has a high cardinality: 861 distinct values | High cardinality |
| CountyName has a high cardinality: 1759 distinct values | High cardinality |
| Year is highly correlated with df_index and 5 other fields | High correlation |
| df_index is highly correlated with Year and 5 other fields | High correlation |
| int_rate is highly correlated with df_index and 2 other fields | High correlation |
| med_hIncome is highly correlated with df_index and 4 other fields | High correlation |
| uspop_growth is highly correlated with int_rate | High correlation |
| unemplt_rate is highly correlated with df_index and 4 other fields | High correlation |
| newHouse_starts is highly correlated with df_index and 4 other fields | High correlation |
| resConstruct_spending is highly correlated with df_index and 4 other fields | High correlation |
| RentPrice has 12081 (7.3%) missing values | Missing |
| SizeRank has 16910 (10.2%) missing values | Missing |
| State has 16910 (10.2%) missing values | Missing |
| City has 16910 (10.2%) missing values | Missing |
| Metro has 51880 (31.3%) missing values | Missing |
| CountyName has 16910 (10.2%) missing values | Missing |
| HomePrice has 19987 (12.1%) missing values | Missing |
| Zipcode is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| Vacancy_Rate% has 8967 (5.4%) zeros | Zeros |
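The per-column "Missing" warnings can be cross-checked against the dataset-level total of 151588 missing cells. A small sketch, with the counts copied from the warnings above and the dictionary name chosen only for illustration:

```python
n_observations = 165600

# Missing counts per column, copied from the "Missing" warnings.
missing_by_column = {
    "RentPrice": 12081,
    "SizeRank": 16910,
    "State": 16910,
    "City": 16910,
    "Metro": 51880,
    "CountyName": 16910,
    "HomePrice": 19987,
}

# Share of missing rows per column, rounded the way the report displays it.
missing_pct = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_column.items()
}
total_missing = sum(missing_by_column.values())

for name, pct in missing_pct.items():
    print(f"{name}: {pct}%")
print(f"Total missing cells: {total_missing}")
```

The seven columns together account for every missing cell in the dataset, so the remaining ten variables are complete.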
df_index
| Distinct count | 165600 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 182159.5 |
|---|---|
| Minimum | 99360 |
| Maximum | 264959 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 99360 |
|---|---|
| 5-th percentile | 107639.95 |
| Q1 | 140759.75 |
| median | 182159.5 |
| Q3 | 223559.25 |
| 95-th percentile | 256679.05 |
| Maximum | 264959 |
| Range | 165599 |
| Interquartile range (IQR) | 82799.5 |
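Because this variable holds 165600 consecutive integers, the quantiles above can be reproduced in closed form. The sketch below assumes linear interpolation between neighbouring order statistics (the default in pandas); `linear_quantile` is a hypothetical helper, not a function from the profiling tool.

```python
def linear_quantile(minimum, n, q):
    """Quantile q of the n consecutive integers starting at `minimum`.

    With linear interpolation the quantile sits at fractional position
    q * (n - 1) in the sorted data, and the sorted values are minimum + i.
    """
    return minimum + q * (n - 1)


n = 165600
minimum = 99360

q05 = linear_quantile(minimum, n, 0.05)
q1 = linear_quantile(minimum, n, 0.25)
median = linear_quantile(minimum, n, 0.50)
q3 = linear_quantile(minimum, n, 0.75)
q95 = linear_quantile(minimum, n, 0.95)
iqr = q3 - q1

print(q05, q1, median, q3, q95, iqr)
```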
Descriptive statistics
| Standard deviation | 47804.74663 |
|---|---|
| Coefficient of variation (CV) | 0.2624334532 |
| Kurtosis | -1.2 |
| Mean | 182159.5 |
| Median Absolute Deviation (MAD) | 41400 |
| Skewness | 0 |
| Sum | 3.01656132e+10 |
| Variance | 2285293800 |
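The descriptive statistics above are consistent with a run of consecutive integers. A minimal sketch, assuming the reported variance is the sample variance (divisor n - 1), for which n consecutive integers give n(n + 1)/12:

```python
import math

n = 165600
minimum, maximum = 99360, 264959

mean = (minimum + maximum) / 2
# Sample variance (ddof=1) of n consecutive integers is n * (n + 1) / 12.
variance = n * (n + 1) // 12
std = math.sqrt(variance)
# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean
# Sum of an arithmetic series.
total = n * (minimum + maximum) // 2

print(mean, variance, round(std, 5), round(cv, 10), total)
```

The population variance, (n² - 1)/12, would come out slightly lower than the reported figure, which suggests the report uses the n - 1 divisor.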
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 264191 | 1 | < 0.1% | |
| 120208 | 1 | < 0.1% | |
| 132458 | 1 | < 0.1% | |
| 138601 | 1 | < 0.1% | |
| 136552 | 1 | < 0.1% | |
| 159079 | 1 | < 0.1% | |
| 157030 | 1 | < 0.1% | |
| 163173 | 1 | < 0.1% | |
| 161124 | 1 | < 0.1% | |
| 150883 | 1 | < 0.1% | |
| Other values (165590) | 165590 | > 99.9% |
| Value | Count | Frequency (%) | |
| 99360 | 1 | < 0.1% | |
| 99361 | 1 | < 0.1% | |
| 99362 | 1 | < 0.1% | |
| 99363 | 1 | < 0.1% | |
| 99364 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 264959 | 1 | < 0.1% | |
| 264958 | 1 | < 0.1% | |
| 264957 | 1 | < 0.1% | |
| 264956 | 1 | < 0.1% | |
| 264955 | 1 | < 0.1% |
Zipcode
| Distinct count | 33120 |
|---|---|
| Unique (%) | 20.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.3 MiB |
| Value | Count | Frequency (%) | |
| 17021 | 5 | < 0.1% | |
| 86511 | 5 | < 0.1% | |
| 58051 | 5 | < 0.1% | |
| 60901 | 5 | < 0.1% | |
| 08550 | 5 | < 0.1% | |
| 92250 | 5 | < 0.1% | |
| 60631 | 5 | < 0.1% | |
| 17360 | 5 | < 0.1% | |
| 60478 | 5 | < 0.1% | |
| 50029 | 5 | < 0.1% | |
| Other values (33110) | 165550 | > 99.9% |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
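Every value here is exactly five characters long and some start with a zero (for example 08550), so the column has to be kept as text. A short sketch of what goes wrong otherwise, using a hypothetical two-row CSV sample built from ZIP codes that appear in this report:

```python
import csv
import io

# Hypothetical sample; only the ZIP codes themselves come from the report.
sample_csv = "Zipcode,Year\n08550,2014\n01062,2014\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# The csv module returns strings, so leading zeros survive.
as_text = [row["Zipcode"] for row in rows]

# Converting to integers silently drops the leading zero.
as_int = [int(z) for z in as_text]

# Zero-padding back to five characters restores the original codes.
restored = [str(z).zfill(5) for z in as_int]

print(as_text, as_int, restored)
```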
RentPrice
| Distinct count | 99354 |
|---|---|
| Unique (%) | 64.7% |
| Missing | 12081 |
| Missing (%) | 7.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1068.026284140725 |
|---|---|
| Minimum | 19.960000000000036 |
| Maximum | 5620.320000000002 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 19.96 |
|---|---|
| 5-th percentile | 568.139 |
| Q1 | 800.07 |
| median | 964.99 |
| Q3 | 1220.73 |
| 95-th percentile | 1892.3805 |
| Maximum | 5620.32 |
| Range | 5600.36 |
| Interquartile range (IQR) | 420.66 |
Descriptive statistics
| Standard deviation | 446.1696767 |
|---|---|
| Coefficient of variation (CV) | 0.4177515884 |
| Kurtosis | 10.77533879 |
| Mean | 1068.026284 |
| Median Absolute Deviation (MAD) | 197 |
| Skewness | 2.339095842 |
| Sum | 163962327.1 |
| Variance | 199067.3804 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1068.85 | 196 | 0.1% | |
| 1339.99 | 190 | 0.1% | |
| 1094.96 | 189 | 0.1% | |
| 1064.99 | 188 | 0.1% | |
| 1011.56 | 187 | 0.1% | |
| 1068.325 | 187 | 0.1% | |
| 1343.85 | 182 | 0.1% | |
| 839.99 | 179 | 0.1% | |
| 1286.56 | 177 | 0.1% | |
| 1343.325 | 177 | 0.1% | |
| Other values (99344) | 151667 | 91.6% | |
| (Missing) | 12081 | 7.3% |
| Value | Count | Frequency (%) | |
| 19.96 | 7 | < 0.1% | |
| 94.96 | 8 | < 0.1% | |
| 103.29 | 1 | < 0.1% | |
| 133.95 | 1 | < 0.1% | |
| 139.4 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 5620.32 | 5 | < 0.1% | |
| 5619.795 | 2 | < 0.1% | |
| 5616.46 | 3 | < 0.1% | |
| 5563.03 | 2 | < 0.1% | |
| 5410.42 | 1 | < 0.1% |
Year
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2016.0 |
|---|---|
| Minimum | 2014 |
| Maximum | 2018 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 2014 |
|---|---|
| 5-th percentile | 2014 |
| Q1 | 2015 |
| median | 2016 |
| Q3 | 2017 |
| 95-th percentile | 2018 |
| Maximum | 2018 |
| Range | 4 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.414217832 |
|---|---|
| Coefficient of variation (CV) | 0.0007014969407 |
| Kurtosis | -1.300003019 |
| Mean | 2016 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0 |
| Sum | 333849600 |
| Variance | 2.000012077 |
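The variance is slightly above 2 rather than exactly 2. A sketch of why, assuming five equally frequent years (33120 rows each, as in the value counts below) and a sample variance with divisor n - 1:

```python
import math

n_per_year = 33120
years = [2014, 2015, 2016, 2017, 2018]
n = n_per_year * len(years)

mean = sum(years) / len(years)
# Sum of squared deviations over all rows.
sum_sq = n_per_year * sum((y - mean) ** 2 for y in years)

population_variance = sum_sq / n      # divisor n
sample_variance = sum_sq / (n - 1)    # divisor n - 1
std = math.sqrt(sample_variance)

print(population_variance, sample_variance, std)
```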
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2018 | 33120 | 20.0% | |
| 2017 | 33120 | 20.0% | |
| 2016 | 33120 | 20.0% | |
| 2015 | 33120 | 20.0% | |
| 2014 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 2014 | 33120 | 20.0% | |
| 2015 | 33120 | 20.0% | |
| 2016 | 33120 | 20.0% | |
| 2017 | 33120 | 20.0% | |
| 2018 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 2018 | 33120 | 20.0% | |
| 2017 | 33120 | 20.0% | |
| 2016 | 33120 | 20.0% | |
| 2015 | 33120 | 20.0% | |
| 2014 | 33120 | 20.0% |
SizeRank
| Distinct count | 11054 |
|---|---|
| Unique (%) | 7.4% |
| Missing | 16910 |
| Missing (%) | 10.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15646.706806106666 |
|---|---|
| Minimum | 0.0 |
| Maximum | 34430.0 |
| Zeros | 5 |
| Zeros (%) | < 0.1% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1503 |
| Q1 | 7531 |
| median | 15164 |
| Q3 | 23514 |
| 95-th percentile | 31180 |
| Maximum | 34430 |
| Range | 34430 |
| Interquartile range (IQR) | 15983 |
Descriptive statistics
| Standard deviation | 9424.136486 |
|---|---|
| Coefficient of variation (CV) | 0.6023079874 |
| Kurtosis | -1.122785048 |
| Mean | 15646.70681 |
| Median Absolute Deviation (MAD) | 7971.5 |
| Skewness | 0.1321161222 |
| Sum | 2326508835 |
| Variance | 88814348.51 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 29964 | 180 | 0.1% | |
| 32062 | 175 | 0.1% | |
| 30545 | 175 | 0.1% | |
| 29685 | 155 | 0.1% | |
| 30892 | 155 | 0.1% | |
| 28904 | 155 | 0.1% | |
| 28097 | 155 | 0.1% | |
| 29439 | 150 | 0.1% | |
| 30096 | 145 | 0.1% | |
| 30504 | 145 | 0.1% | |
| Other values (11044) | 147100 | 88.8% | |
| (Missing) | 16910 | 10.2% |
| Value | Count | Frequency (%) | |
| 0 | 5 | < 0.1% | |
| 1 | 5 | < 0.1% | |
| 2 | 5 | < 0.1% | |
| 3 | 5 | < 0.1% | |
| 4 | 5 | < 0.1% |
| Value | Count | Frequency (%) | |
| 34430 | 115 | 0.1% | |
| 34322 | 80 | < 0.1% | |
| 34302 | 15 | < 0.1% | |
| 34272 | 5 | < 0.1% | |
| 34258 | 5 | < 0.1% |
State
| Distinct count | 51 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 16910 |
| Missing (%) | 10.2% |
| Memory size | 1.3 MiB |
| Value | Count | Frequency (%) | |
| TX | 8800 | 5.3% | |
| NY | 8490 | 5.1% | |
| CA | 8315 | 5.0% | |
| PA | 8155 | 4.9% | |
| IL | 6345 | 3.8% | |
| OH | 5800 | 3.5% | |
| FL | 4715 | 2.8% | |
| MI | 4690 | 2.8% | |
| MO | 4645 | 2.8% | |
| IA | 4615 | 2.8% | |
| Other values (41) | 84120 | 50.8% | |
| (Missing) | 16910 | 10.2% |
Length
| Max length | 3 |
|---|---|
| Median length | 2 |
| Mean length | 2.102113527 |
| Min length | 2 |
City
| Distinct count | 14740 |
|---|---|
| Unique (%) | 9.9% |
| Missing | 16910 |
| Missing (%) | 10.2% |
| Memory size | 1.3 MiB |
| Value | Count | Frequency (%) | |
| New York | 855 | 0.5% | |
| Houston | 535 | 0.3% | |
| Los Angeles | 500 | 0.3% | |
| San Antonio | 280 | 0.2% | |
| Chicago | 275 | 0.2% | |
| Springfield | 270 | 0.2% | |
| Dallas | 260 | 0.2% | |
| Columbus | 255 | 0.2% | |
| Kansas City | 250 | 0.2% | |
| Philadelphia | 245 | 0.1% | |
| Other values (14730) | 144965 | 87.5% | |
| (Missing) | 16910 | 10.2% |
Length
| Max length | 30 |
|---|---|
| Median length | 8 |
| Mean length | 8.448641304 |
| Min length | 3 |
Metro
| Distinct count | 861 |
|---|---|
| Unique (%) | 0.8% |
| Missing | 51880 |
| Missing (%) | 31.3% |
| Memory size | 1.3 MiB |
| Value | Count | Frequency (%) | |
| New York-Newark-Jersey City | 4635 | 2.8% | |
| Chicago-Naperville-Elgin | 1910 | 1.2% | |
| Los Angeles-Long Beach-Anaheim | 1810 | 1.1% | |
| Philadelphia-Camden-Wilmington | 1770 | 1.1% | |
| Washington-Arlington-Alexandria | 1595 | 1.0% | |
| Pittsburgh | 1590 | 1.0% | |
| Boston-Cambridge-Newton | 1380 | 0.8% | |
| Dallas-Fort Worth-Arlington | 1320 | 0.8% | |
| Houston-The Woodlands-Sugar Land | 1180 | 0.7% | |
| Minneapolis-St. Paul-Bloomington | 1150 | 0.7% | |
| Other values (851) | 95380 | 57.6% | |
| (Missing) | 51880 | 31.3% |
Length
| Max length | 42 |
|---|---|
| Median length | 9 |
| Mean length | 12.30697464 |
| Min length | 3 |
CountyName
| Distinct count | 1759 |
|---|---|
| Unique (%) | 1.2% |
| Missing | 16910 |
| Missing (%) | 10.2% |
| Memory size | 1.3 MiB |
| Value | Count | Frequency (%) | |
| Washington County | 1780 | 1.1% | |
| Jefferson County | 1625 | 1.0% | |
| Los Angeles County | 1375 | 0.8% | |
| Franklin County | 1325 | 0.8% | |
| Montgomery County | 1325 | 0.8% | |
| Jackson County | 1165 | 0.7% | |
| Orange County | 1100 | 0.7% | |
| Marion County | 910 | 0.5% | |
| Wayne County | 880 | 0.5% | |
| Monroe County | 880 | 0.5% | |
| Other values (1749) | 136325 | 82.3% | |
| (Missing) | 16910 | 10.2% |
Length
| Max length | 29 |
|---|---|
| Median length | 14 |
| Mean length | 13.07388285 |
| Min length | 3 |
HomePrice
| Distinct count | 142642 |
|---|---|
| Unique (%) | 98.0% |
| Missing | 19987 |
| Missing (%) | 12.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 194823.4147800677 |
|---|---|
| Minimum | 11860.83 |
| Maximum | 6141945.92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 11860.83 |
|---|---|
| 5-th percentile | 51255.216 |
| Q1 | 90546.75 |
| median | 139991.42 |
| Q3 | 225666.83 |
| 95-th percentile | 512356.518 |
| Maximum | 6141945.92 |
| Range | 6130085.09 |
| Interquartile range (IQR) | 135120.08 |
Descriptive statistics
| Standard deviation | 201176.0974 |
|---|---|
| Coefficient of variation (CV) | 1.032607388 |
| Kurtosis | 66.01927228 |
| Mean | 194823.4148 |
| Median Absolute Deviation (MAD) | 59113.25 |
| Skewness | 5.768940617 |
| Sum | 2.83688219e+10 |
| Variance | 4.047182217e+10 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 57537.17 | 4 | < 0.1% | |
| 75169.83 | 4 | < 0.1% | |
| 50318.58 | 3 | < 0.1% | |
| 57930.5 | 3 | < 0.1% | |
| 103050.08 | 3 | < 0.1% | |
| 64628.33 | 3 | < 0.1% | |
| 132820.5 | 3 | < 0.1% | |
| 49579 | 3 | < 0.1% | |
| 59689.42 | 3 | < 0.1% | |
| 54673.67 | 3 | < 0.1% | |
| Other values (142632) | 145581 | 87.9% | |
| (Missing) | 19987 | 12.1% |
| Value | Count | Frequency (%) | |
| 11860.83 | 1 | < 0.1% | |
| 12147.5 | 1 | < 0.1% | |
| 12309.92 | 1 | < 0.1% | |
| 13387.33 | 1 | < 0.1% | |
| 13546.67 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 6141945.92 | 1 | < 0.1% | |
| 5373670.92 | 1 | < 0.1% | |
| 5197037.17 | 1 | < 0.1% | |
| 4928414.67 | 1 | < 0.1% | |
| 4771183.92 | 1 | < 0.1% |
Vacancy_Rate%
| Distinct count | 116581 |
|---|---|
| Unique (%) | 70.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.905033346462375 |
|---|---|
| Minimum | 0.0 |
| Maximum | 100.0 |
| Zeros | 8967 |
| Zeros (%) | 5.4% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 7.014507465 |
| median | 13.0191052 |
| Q3 | 22.92878031 |
| 95-th percentile | 53.21844737 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 15.91427285 |
Descriptive statistics
| Standard deviation | 16.57333207 |
|---|---|
| Coefficient of variation (CV) | 0.9256241947 |
| Kurtosis | 4.447530561 |
| Mean | 17.90503335 |
| Median Absolute Deviation (MAD) | 7.094430238 |
| Skewness | 1.932858015 |
| Sum | 2965073.522 |
| Variance | 274.675336 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 8967 | 5.4% | |
| 100 | 432 | 0.3% | |
| 25 | 199 | 0.1% | |
| 20 | 195 | 0.1% | |
| 33.33333333 | 180 | 0.1% | |
| 16.66666667 | 161 | 0.1% | |
| 14.28571429 | 133 | 0.1% | |
| 12.5 | 131 | 0.1% | |
| 50 | 124 | 0.1% | |
| 11.11111111 | 111 | 0.1% | |
| Other values (116571) | 154967 | 93.6% |
| Value | Count | Frequency (%) | |
| 0 | 8967 | 5.4% | |
| 0.02272727273 | 1 | < 0.1% | |
| 0.1114827202 | 1 | < 0.1% | |
| 0.1248439451 | 1 | < 0.1% | |
| 0.1402524544 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 100 | 432 | 0.3% | |
| 99.83974359 | 1 | < 0.1% | |
| 99.71791255 | 1 | < 0.1% | |
| 99.65337955 | 1 | < 0.1% | |
| 99.57386364 | 1 | < 0.1% |
int_rate
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.325 |
|---|---|
| Minimum | 0.75 |
| Maximum | 2.458333333333333 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0.75 |
|---|---|
| 5-th percentile | 0.75 |
| Q1 | 0.7708333333 |
| median | 1.020833333 |
| Q3 | 1.625 |
| 95-th percentile | 2.458333333 |
| Maximum | 2.458333333 |
| Range | 1.708333333 |
| Interquartile range (IQR) | 0.8541666667 |
Descriptive statistics
| Standard deviation | 0.6487989226 |
|---|---|
| Coefficient of variation (CV) | 0.4896595642 |
| Kurtosis | -0.8891515948 |
| Mean | 1.325 |
| Median Absolute Deviation (MAD) | 0.2708333333 |
| Skewness | 0.8013670211 |
| Sum | 219420 |
| Variance | 0.4209400419 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1.020833333 | 33120 | 20.0% | |
| 0.7708333333 | 33120 | 20.0% | |
| 2.458333333 | 33120 | 20.0% | |
| 1.625 | 33120 | 20.0% | |
| 0.75 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 0.75 | 33120 | 20.0% | |
| 0.7708333333 | 33120 | 20.0% | |
| 1.020833333 | 33120 | 20.0% | |
| 1.625 | 33120 | 20.0% | |
| 2.458333333 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 2.458333333 | 33120 | 20.0% | |
| 1.625 | 33120 | 20.0% | |
| 1.020833333 | 33120 | 20.0% | |
| 0.7708333333 | 33120 | 20.0% | |
| 0.75 | 33120 | 20.0% |
med_hIncome
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 61994.2 |
|---|---|
| Minimum | 58001.0 |
| Maximum | 64324.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 58001 |
|---|---|
| 5-th percentile | 58001 |
| Q1 | 60987 |
| median | 62898 |
| Q3 | 63761 |
| 95-th percentile | 64324 |
| Maximum | 64324 |
| Range | 6323 |
| Interquartile range (IQR) | 2774 |
Descriptive statistics
| Standard deviation | 2294.631202 |
|---|---|
| Coefficient of variation (CV) | 0.03701364325 |
| Kurtosis | -0.8706176326 |
| Mean | 61994.2 |
| Median Absolute Deviation (MAD) | 1426 |
| Skewness | -0.7581061267 |
| Sum | 1.026623952e+10 |
| Variance | 5265332.355 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 64324 | 33120 | 20.0% | |
| 63761 | 33120 | 20.0% | |
| 62898 | 33120 | 20.0% | |
| 60987 | 33120 | 20.0% | |
| 58001 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 58001 | 33120 | 20.0% | |
| 60987 | 33120 | 20.0% | |
| 62898 | 33120 | 20.0% | |
| 63761 | 33120 | 20.0% | |
| 64324 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 64324 | 33120 | 20.0% | |
| 63761 | 33120 | 20.0% | |
| 62898 | 33120 | 20.0% | |
| 60987 | 33120 | 20.0% | |
| 58001 | 33120 | 20.0% |
uspop_growth
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6656347077111499 |
|---|---|
| Minimum | 0.5223373578996761 |
| Maximum | 0.730641178178307 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 0.5223373579 |
|---|---|
| 5-th percentile | 0.5223373579 |
| Q1 | 0.6310078932 |
| median | 0.7166694134 |
| Q3 | 0.7275176958 |
| 95-th percentile | 0.7306411782 |
| Maximum | 0.7306411782 |
| Range | 0.2083038203 |
| Interquartile range (IQR) | 0.09650980259 |
Descriptive statistics
| Standard deviation | 0.08049003536 |
|---|---|
| Coefficient of variation (CV) | 0.1209222332 |
| Kurtosis | -0.7966578346 |
| Mean | 0.6656347077 |
| Median Absolute Deviation (MAD) | 0.01397176475 |
| Skewness | -0.8972530783 |
| Sum | 110229.1076 |
| Variance | 0.006478645793 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0.6310078932 | 33120 | 20.0% | |
| 0.7166694134 | 33120 | 20.0% | |
| 0.7306411782 | 33120 | 20.0% | |
| 0.5223373579 | 33120 | 20.0% | |
| 0.7275176958 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 0.5223373579 | 33120 | 20.0% | |
| 0.6310078932 | 33120 | 20.0% | |
| 0.7166694134 | 33120 | 20.0% | |
| 0.7275176958 | 33120 | 20.0% | |
| 0.7306411782 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 0.7306411782 | 33120 | 20.0% | |
| 0.7275176958 | 33120 | 20.0% | |
| 0.7166694134 | 33120 | 20.0% | |
| 0.6310078932 | 33120 | 20.0% | |
| 0.5223373579 | 33120 | 20.0% |
unemplt_rate
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.908333333333333 |
|---|---|
| Minimum | 3.8916666666666666 |
| Maximum | 6.158333333333332 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 3.891666667 |
|---|---|
| 5-th percentile | 3.891666667 |
| Q1 | 4.341666667 |
| median | 4.875 |
| Q3 | 5.275 |
| 95-th percentile | 6.158333333 |
| Maximum | 6.158333333 |
| Range | 2.266666667 |
| Interquartile range (IQR) | 0.9333333333 |
Descriptive statistics
| Standard deviation | 0.7813829039 |
|---|---|
| Coefficient of variation (CV) | 0.1591951587 |
| Kurtosis | -1.051948045 |
| Mean | 4.908333333 |
| Median Absolute Deviation (MAD) | 0.5333333333 |
| Skewness | 0.3226278116 |
| Sum | 812820 |
| Variance | 0.6105592425 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 5.275 | 33120 | 20.0% | |
| 4.341666667 | 33120 | 20.0% | |
| 6.158333333 | 33120 | 20.0% | |
| 4.875 | 33120 | 20.0% | |
| 3.891666667 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 3.891666667 | 33120 | 20.0% | |
| 4.341666667 | 33120 | 20.0% | |
| 4.875 | 33120 | 20.0% | |
| 5.275 | 33120 | 20.0% | |
| 6.158333333 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 6.158333333 | 33120 | 20.0% | |
| 5.275 | 33120 | 20.0% | |
| 4.875 | 33120 | 20.0% | |
| 4.341666667 | 33120 | 20.0% | |
| 3.891666667 | 33120 | 20.0% |
newHouse_starts
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1147.85 |
|---|---|
| Minimum | 1000.25 |
| Maximum | 1248.25 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 1000.25 |
|---|---|
| 5-th percentile | 1000.25 |
| Q1 | 1106.75 |
| median | 1176.583333 |
| Q3 | 1207.416667 |
| 95-th percentile | 1248.25 |
| Maximum | 1248.25 |
| Range | 248 |
| Interquartile range (IQR) | 100.6666667 |
Descriptive statistics
| Standard deviation | 87.09667188 |
|---|---|
| Coefficient of variation (CV) | 0.07587809547 |
| Kurtosis | -0.9412115637 |
| Mean | 1147.85 |
| Median Absolute Deviation (MAD) | 69.83333333 |
| Skewness | -0.6168959352 |
| Sum | 190083960 |
| Variance | 7585.830253 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1000.25 | 33120 | 20.0% | |
| 1176.583333 | 33120 | 20.0% | |
| 1207.416667 | 33120 | 20.0% | |
| 1248.25 | 33120 | 20.0% | |
| 1106.75 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 1000.25 | 33120 | 20.0% | |
| 1106.75 | 33120 | 20.0% | |
| 1176.583333 | 33120 | 20.0% | |
| 1207.416667 | 33120 | 20.0% | |
| 1248.25 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 1248.25 | 33120 | 20.0% | |
| 1207.416667 | 33120 | 20.0% | |
| 1176.583333 | 33120 | 20.0% | |
| 1106.75 | 33120 | 20.0% | |
| 1000.25 | 33120 | 20.0% |
resConstruct_spending
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 483455.61666666646 |
|---|---|
| Minimum | 382868.33333333326 |
| Maximum | 564448.75 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.3 MiB |
Quantile statistics
| Minimum | 382868.3333 |
|---|---|
| 5-th percentile | 382868.3333 |
| Q1 | 438118.3333 |
| median | 485822.5 |
| Q3 | 546020.1667 |
| 95-th percentile | 564448.75 |
| Maximum | 564448.75 |
| Range | 181580.4167 |
| Interquartile range (IQR) | 107901.8333 |
Descriptive statistics
| Standard deviation | 67310.05916 |
|---|---|
| Coefficient of variation (CV) | 0.139226967 |
| Kurtosis | -1.392824449 |
| Mean | 483455.6167 |
| Median Absolute Deviation (MAD) | 60197.66667 |
| Skewness | -0.2195061288 |
| Sum | 8.006025012e+10 |
| Variance | 4530644065 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 546020.1667 | 33120 | 20.0% | |
| 485822.5 | 33120 | 20.0% | |
| 438118.3333 | 33120 | 20.0% | |
| 564448.75 | 33120 | 20.0% | |
| 382868.3333 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 382868.3333 | 33120 | 20.0% | |
| 438118.3333 | 33120 | 20.0% | |
| 485822.5 | 33120 | 20.0% | |
| 546020.1667 | 33120 | 20.0% | |
| 564448.75 | 33120 | 20.0% |
| Value | Count | Frequency (%) | |
| 564448.75 | 33120 | 20.0% | |
| 546020.1667 | 33120 | 20.0% | |
| 485822.5 | 33120 | 20.0% | |
| 438118.3333 | 33120 | 20.0% | |
| 382868.3333 | 33120 | 20.0% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
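As a rough illustration of the calculation described above, here is a plain-Python sketch. The function name and the toy data are assumptions made for the example; they are not taken from the report or from the profiling tool.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    so plain sums of deviations are enough.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 7 for y in ys])

print(r, r_rescaled)
```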
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
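The rank-based calculation can be sketched in plain Python as follows. The helper names and the toy data are illustrative assumptions; ties receive the average of the ranks they span, which is one common convention.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: monotonic but clearly nonlinear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.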
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
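The pair-counting definition above translates directly into code. This sketch follows the description literally (often called tau-a) and makes no correction for ties, so it may differ from what a statistics library reports on tied data; the function name and toy data are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```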
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
First rows
| | df_index | Zipcode | RentPrice | Year | SizeRank | State | City | Metro | CountyName | HomePrice | Vacancy_Rate% | int_rate | med_hIncome | uspop_growth | unemplt_rate | newHouse_starts | resConstruct_spending |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 99360 | 01062 | 989.05 | 2014 | 8993.0 | MA | Northampton | Springfield | Hampshire County | 260977.17 | 5.333333 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 1 | 99361 | 01063 | NaN | 2014 | 34430.0 | MA | Northampton | Springfield | Hampshire County | 554513.42 | 0.000000 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 2 | 99362 | 01066 | 1344.96 | 2014 | 28699.0 | MA | Hatfield | Springfield | Hampshire County | 294024.33 | 0.000000 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 3 | 99363 | 01068 | 999.00 | 2014 | 19818.0 | MA | Oakham | Worcester | Worcester County | 242939.00 | 5.465288 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 4 | 99364 | 01069 | 657.39 | 2014 | 10351.0 | MA | Palmer | Springfield | Hampden County | 181901.60 | 8.557718 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 5 | 99365 | 01070 | 752.10 | 2014 | 26235.0 | MA | Plainfield | Springfield | Hampshire County | 217112.75 | 25.602410 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 6 | 99366 | 01071 | 986.63 | 2014 | 21453.0 | MA | Russell | Springfield | Hampden County | 194616.33 | 8.925620 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 7 | 99367 | 01072 | 1530.33 | 2014 | 21068.0 | MA | Shutesbury | Greenfield Town | Franklin County | 232787.58 | 17.885117 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 8 | 99368 | 01073 | 1106.99 | 2014 | 12153.0 | MA | Southampton | Springfield | Hampshire County | 277385.58 | 2.164863 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
| 9 | 99369 | 01074 | 1234.57 | 2014 | NaN | NaN | NaN | NaN | NaN | NaN | 14.285714 | 0.75 | 58001.0 | 0.727518 | 6.158333 | 1000.25 | 382868.333333 |
Last rows
| | df_index | Zipcode | RentPrice | Year | SizeRank | State | City | Metro | CountyName | HomePrice | Vacancy_Rate% | int_rate | med_hIncome | uspop_growth | unemplt_rate | newHouse_starts | resConstruct_spending |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 165590 | 264950 | 98134 | 1909.58 | 2018 | 28159.0 | WA | Seattle | Seattle-Tacoma-Bellevue | King County | 438970.00 | 13.580247 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165591 | 264951 | 98174 | NaN | 2018 | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165592 | 264952 | 98222 | NaN | 2018 | 30981.0 | WA | Olga | NaN | San Juan County | 580646.58 | 83.471074 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165593 | 264953 | 98233 | 1413.89 | 2018 | 7640.0 | WA | Burlington | Mount Vernon-Anacortes | Skagit County | 317426.75 | 4.853765 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165594 | 264954 | 98243 | 1302.94 | 2018 | NaN | NaN | NaN | NaN | NaN | NaN | 57.293233 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165595 | 264955 | 98279 | 1059.87 | 2018 | 23400.0 | WA | Olga | NaN | San Juan County | 552805.42 | 51.219512 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165596 | 264956 | 98280 | 993.85 | 2018 | 25265.0 | WA | Eastsound | NaN | San Juan County | 678499.00 | 51.329243 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165597 | 264957 | 98311 | 1533.50 | 2018 | 4981.0 | WA | Bremerton | Bremerton-Silverdale | Kitsap County | 314320.83 | 6.540162 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165598 | 264958 | 98326 | 778.99 | 2018 | 26185.0 | WA | Clallam Bay | Port Angeles | Clallam County | 150193.17 | 28.537736 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |
| 165599 | 264959 | 98332 | 1840.86 | 2018 | 6759.0 | WA | Gig Harbor | Seattle-Tacoma-Bellevue | Pierce County | 535136.75 | 7.340077 | 2.458333 | 64324.0 | 0.522337 | 3.891667 | 1248.25 | 564448.75 |